Advances in CNNs
Transfer Learning
Let's code! TF + transfer learning
"Fully Convolutional" Neural Networks
Let's code! TF + transfer learning + FCNNs
Tutorial: Google Colab GPU-enhanced runtime
Tutorial: Machine Learning Experiments
Where we left off...

We saw that convolution with machine-learned filters was the leading method for solving computer vision classification problems.

AlexNet 2012

Contribution:
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems 25 (2012): 1097-1105.
VGG-16, -19 2014

Contribution:
Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition
Inception (GoogLeNet) 2014
| Inception cell | Inception Architecture |
|---|---|
| ![]() | ![]() |
Contribution:
Szegedy, C., Liu, W., Jia, Y., et al. (2014) Going Deeper with Convolutions.
ResNet 2015
| ResNet cell | ResNet Architecture |
|---|---|
| ![]() | ![]() |
Contribution:
He, K., Zhang, X., Ren, S., and Sun, J. (2015) Deep Residual Learning for Image Recognition.
Aside: What is batch normalisation?
Batch normalisation layers are placed after convolutional layers to normalise each batch during training to a given mean and variance. The subsequent layer then always receives, as input, data with the same $\mu$ and $\sigma$. This stabilises and speeds up training, and adds a mild regularising effect.
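The mechanics can be sketched in NumPy (a minimal sketch of the training-time computation only; in a real layer gamma and beta are learned parameters, and running statistics are tracked for use at inference):

```python
import numpy as np

# A sketch of what a batch-normalisation layer computes at training time
# (gamma and beta stand in for the layer's learned scale and shift):
def batch_norm_sketch(x, gamma=1.0, beta=0.0, eps=1e-5):
    mu = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                    # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalise to mean 0, variance 1
    return gamma * x_hat + beta            # re-scale and shift

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(128, 16))  # activations with mean ~5, std ~3
normed = batch_norm_sketch(batch)  # now mean ~0, std ~1 per feature
```

Whatever distribution the previous layer produces, the next layer sees inputs with the same mean and variance.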
ResNeXt 2017

Contribution:
Xie, S., Girshick, R., Dollár, P., Tu, Z., and He, K. (2017) Aggregated Residual Transformations for Deep Neural Networks.
DenseNet 2018
| DenseNet cell | DenseNet Architecture |
|---|---|
| ![]() | ![]() |
Contribution:
Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. Q. (2018) Densely Connected Convolutional Networks.
These networks represent a fairly stable paradigm. Later iterations turned to automated architecture search and model scaling:
AmoebaNet 2019

Contribution:
Real, E., Aggarwal, A., Huang, Y., & Le, Q. V. (2019) Regularized Evolution for Image Classifier Architecture Search.
EfficientNet 2020

Contribution:
Tan, M., & Le, Q. V. (2020) EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Performance comparison:

finally...
Transformers!
ViT 2020

Contribution:
Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2020) An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
That's all well and good, but how can I use any of this with my own data? I only have 10 samples???
10? 100? 1,000? 10,000?
With transfer learning, we pretrain a model in one domain, and then transfer the model's knowledge of that domain to another. Finetuning in the new domain leads to a model with improved generalisability and performance.


We typically do this when we have a shortage of data in our target domain (e.g. labelled images of dog breeds), but a surplus of images in a similar domain (e.g. labelled images of animal types). We can transfer a model's learned knowledge of animal types to the problem of predicting dog breeds.
In practice, you will almost always start with a pretrained model - you wouldn't want to train VGG-19 yourself.
This means that you can begin many problems from the baseline of a model trained for many days by Microsoft, Google, or Oxford academics.
This is very common practice, with applications in computer vision and NLP.
There are many domain transfer and training tricks, but let's leave those for now.
Let's return to our Flowers problem. We had a dataset of 3,670 real pictures of flowers classified into one of five categories. With a basic AlexNet we quickly achieved a classifier accuracy of ~50%. Let's see whether transfer learning improves our outcome.
import os, sys, glob # some built-ins
from random import shuffle # shuffle a list of elements in-place
from PIL import Image # image manipulation
import requests # http requests
import matplotlib.pyplot as plt # visualisation
import numpy as np # data manipulation
from scipy.signal import convolve2d # to demo convolution
from sklearn.metrics import confusion_matrix
from skimage.io import imread # read an image to a np array
from skimage.transform import resize # resize an image
from skimage.util import crop, pad # crop or pad an image
import tensorflow as tf
import tensorflow_datasets as tfds # built-in MNIST
tf.config.list_physical_devices() # let's check whether TF is GPU-ready
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
### set a root directory for managing paths
root = os.path.abspath(os.path.join(os.getcwd(),'..'))
### You can re-download the flowers data if you don't have it from the previous lecture
!wget -c https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz -O - | tar -xz -C {root}/data
### first, let's get all the image records
records = [{
    'flower': f.split('/')[-2],
    'path': f,
} for f in glob.glob(os.path.join(root,'data','flower_photos','**','*.jpg'))]
print ('N records:',len(records))
print(records[0])
N records: 3670
{'flower': 'roses', 'path': '/home/jupyter/ox-sbs-ml-bd/data/flower_photos/roses/269037241_07fceff56a_m.jpg'}
shuffle(records)
### visualise our data
fig, axs = plt.subplots(4, 6, figsize=(20,12))
axs = axs.flatten()
for ii_r, r in enumerate(records[0:24]):
    arr = imread(r['path'])
    axs[ii_r].imshow(arr)
    axs[ii_r].set_title(r['flower'])
plt.show()
We'll use our generator function again:
def flowers_generator(records, output_shape=(200,200), mode='random_crop'):
    ### a wrapper for our generator. Takes all our parameters and returns the generator.
    # map flower names to integer class labels
    mapper = {'dandelion': 0, 'sunflowers': 1, 'daisy': 2, 'tulips': 3, 'roses': 4}
    def _generator():
        ### The internal generator must not take any parameters.
        for r in records:
            # io
            x = (imread(r['path'])).astype(np.float32) # <- CHANGE HERE! Don't normalise.
            y = np.array(mapper[r['flower']]).astype(np.float32)
            # reduce dimension of array
            if mode=='resize':
                x = resize(x, output_shape)
            elif mode=='random_crop':
                crop_width = [(0,0)]*3
                pad_width = [(0,0)]*3
                for ax in [0,1]:
                    if x.shape[ax]>output_shape[ax]:
                        crop_val = np.random.choice(x.shape[ax]-output_shape[ax])
                        crop_width[ax] = (crop_val, x.shape[ax]-output_shape[ax]-crop_val)
                    elif x.shape[ax]<output_shape[ax]:
                        pad_val = np.random.choice(output_shape[ax]-x.shape[ax])
                        pad_width[ax] = (pad_val, output_shape[ax]-x.shape[ax]-pad_val)
                x = crop(x, crop_width)
                x = pad(x, pad_width)
            yield tf.convert_to_tensor(x), tf.convert_to_tensor(y)
    return _generator
trn_split = 0.7
val_split = 0.9
generator_obj_trn = flowers_generator(
    records[0:int(trn_split*len(records))],
    output_shape=(200,200),
    mode='resize'
)
generator_obj_val = flowers_generator(
    records[int(trn_split*len(records)):int(val_split*len(records))],
    output_shape=(200,200),
    mode='resize'
)
ds_flowers_trn = (
    tf.data.Dataset.from_generator(
        generator_obj_trn,
        output_signature=(
            tf.TensorSpec(shape=(200,200,3), dtype=tf.float32),
            tf.TensorSpec(shape=(), dtype=tf.float32)))
    .cache().batch(128).prefetch(tf.data.experimental.AUTOTUNE)
)
ds_flowers_val = (
    tf.data.Dataset.from_generator(
        generator_obj_val,
        output_signature=(
            tf.TensorSpec(shape=(200,200,3), dtype=tf.float32),
            tf.TensorSpec(shape=(), dtype=tf.float32)))
    .cache().batch(128).prefetch(tf.data.experimental.AUTOTUNE)
)
Both TF and PyTorch have built-in libraries for importing pretrained models. Let's look at the documentation for importing a pre-trained VGG-16 model.
model = tf.keras.applications.VGG16(
    include_top=True,
    weights='imagenet',
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation='softmax'
)
And we also have this warning:
Note: each Keras Application expects a specific kind of input preprocessing. For VGG16, call tf.keras.applications.vgg16.preprocess_input on your inputs before passing them to the model. vgg16.preprocess_input will convert the input images from RGB to BGR, then will zero-center each color channel with respect to the ImageNet dataset, without scaling.
Let's break down each of these options.
include_top: like many CNNs, VGG-16 has several fully-connected layers after the convolutional layers. If we pass False to this parameter, the VGG model will be instantiated without the FC layers, which we might want for other downstream tasks.
weights: this option allows us to instantiate a VGG-16 model with pretrained weights. Exactly what we want for transfer learning!
input_tensor, input_shape: specify a different input shape for the model if not using the pretrained fully-connected layers.
pooling: if not using the top FC layers, we can specify a different pooling function on the convolutional output, if we want.
classes: if not using ImageNet weights, we can instantiate the VGG network with a different number of output classes.
classifier_activation: if we want to use a different activation function (e.g. sigmoid for multi-label classification), we can specify that here.
We want to use ImageNet-pretrained weights, but we only have 5 classes. We'll want to drop the top FC layers and use maxpooling to flatten the output. Then we'll add our own fully connected layers. We can also use an input size of (200,200,3) to match our previous generator (because we've removed the FC header).
Per the warning, we'll also need to normalise our data in the same way that VGG-16 was trained. Fortunately, Keras has a built-in preprocessor for this.
def vgg16_premapper(_x, _y): # sample and target are now tf tensors
    return tf.keras.applications.vgg16.preprocess_input(_x), _y # return the (image, label) tuple
ds_flowers_trn = ds_flowers_trn.map(vgg16_premapper, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_flowers_val = ds_flowers_val.map(vgg16_premapper, num_parallel_calls=tf.data.experimental.AUTOTUNE)
Let's check the shape of our data.
a,b = next(ds_flowers_trn.as_numpy_iterator())
a.shape, a.max(), a.min(), b.shape, b.max(), b.min()
((128, 200, 200, 3), 151.061, -123.68, (128,), 4.0, 0.0)
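Those extremes come straight from the preprocessing. What preprocess_input does can be sketched in NumPy (a sketch for intuition, not the library code; the per-channel means below are the standard ImageNet values it subtracts):

```python
import numpy as np

# A NumPy sketch of tf.keras.applications.vgg16.preprocess_input:
# flip channels RGB -> BGR, then subtract the per-channel ImageNet means,
# with no scaling.
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_preprocess_sketch(x):
    x = x[..., ::-1]               # RGB -> BGR
    return x - IMAGENET_BGR_MEANS  # zero-centre each channel

white = np.full((1, 2, 2, 3), 255.0, dtype=np.float32)  # an all-white image
out = vgg16_preprocess_sketch(white)
```

A pure-white pixel maps to a maximum of 255 - 103.939 = 151.061, and a pure-black pixel to a minimum of -123.68: exactly the extremes we just saw in the mapped dataset.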
model_vgg = tf.keras.applications.VGG16(
    include_top=False,
    input_shape=(200,200,3),
    weights='imagenet',
    pooling='max',
)
model_vgg.summary()
Model: "vgg16" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_1 (InputLayer) [(None, 200, 200, 3)] 0 _________________________________________________________________ block1_conv1 (Conv2D) (None, 200, 200, 64) 1792 _________________________________________________________________ block1_conv2 (Conv2D) (None, 200, 200, 64) 36928 _________________________________________________________________ block1_pool (MaxPooling2D) (None, 100, 100, 64) 0 _________________________________________________________________ block2_conv1 (Conv2D) (None, 100, 100, 128) 73856 _________________________________________________________________ block2_conv2 (Conv2D) (None, 100, 100, 128) 147584 _________________________________________________________________ block2_pool (MaxPooling2D) (None, 50, 50, 128) 0 _________________________________________________________________ block3_conv1 (Conv2D) (None, 50, 50, 256) 295168 _________________________________________________________________ block3_conv2 (Conv2D) (None, 50, 50, 256) 590080 _________________________________________________________________ block3_conv3 (Conv2D) (None, 50, 50, 256) 590080 _________________________________________________________________ block3_pool (MaxPooling2D) (None, 25, 25, 256) 0 _________________________________________________________________ block4_conv1 (Conv2D) (None, 25, 25, 512) 1180160 _________________________________________________________________ block4_conv2 (Conv2D) (None, 25, 25, 512) 2359808 _________________________________________________________________ block4_conv3 (Conv2D) (None, 25, 25, 512) 2359808 _________________________________________________________________ block4_pool (MaxPooling2D) (None, 12, 12, 512) 0 _________________________________________________________________ block5_conv1 (Conv2D) (None, 12, 12, 512) 2359808 
_________________________________________________________________ block5_conv2 (Conv2D) (None, 12, 12, 512) 2359808 _________________________________________________________________ block5_conv3 (Conv2D) (None, 12, 12, 512) 2359808 _________________________________________________________________ block5_pool (MaxPooling2D) (None, 6, 6, 512) 0 _________________________________________________________________ global_max_pooling2d (Global (None, 512) 0 ================================================================= Total params: 14,714,688 Trainable params: 14,714,688 Non-trainable params: 0 _________________________________________________________________
# We don't have enough data to retrain all of VGG. Let's make our vgg model non-trainable.
model_vgg.trainable=False
# now make a new model using the nested VGG model
model = tf.keras.Sequential([
    model_vgg,
    tf.keras.layers.Dense(512),
    tf.keras.layers.Dropout(0.5), # add a little bit of regularisation
    tf.keras.layers.Dense(512),
    tf.keras.layers.Dropout(0.5), # add a little bit of regularisation
    tf.keras.layers.Dense(5),
], name='flowers_vgg_classifier')
model.summary()
Model: "flowers_vgg_classifier" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= vgg16 (Functional) (None, 512) 14714688 _________________________________________________________________ dense (Dense) (None, 512) 262656 _________________________________________________________________ dropout (Dropout) (None, 512) 0 _________________________________________________________________ dense_1 (Dense) (None, 512) 262656 _________________________________________________________________ dropout_1 (Dropout) (None, 512) 0 _________________________________________________________________ dense_2 (Dense) (None, 5) 2565 ================================================================= Total params: 15,242,565 Trainable params: 527,877 Non-trainable params: 14,714,688 _________________________________________________________________
### or, if you want to see all the layers:
model = tf.keras.Sequential(
    [L for L in model_vgg.layers] +
    [
        tf.keras.layers.Dense(512),
        tf.keras.layers.Dropout(0.5), # add a little bit of regularisation
        tf.keras.layers.Dense(512),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(5),
    ],
    name='flowers_vggext_classifier'
)
# a bit more verbose
model.summary()
Model: "flowers_vggext_classifier" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= block1_conv1 (Conv2D) (None, 200, 200, 64) 1792 _________________________________________________________________ block1_conv2 (Conv2D) (None, 200, 200, 64) 36928 _________________________________________________________________ block1_pool (MaxPooling2D) (None, 100, 100, 64) 0 _________________________________________________________________ block2_conv1 (Conv2D) (None, 100, 100, 128) 73856 _________________________________________________________________ block2_conv2 (Conv2D) (None, 100, 100, 128) 147584 _________________________________________________________________ block2_pool (MaxPooling2D) (None, 50, 50, 128) 0 _________________________________________________________________ block3_conv1 (Conv2D) (None, 50, 50, 256) 295168 _________________________________________________________________ block3_conv2 (Conv2D) (None, 50, 50, 256) 590080 _________________________________________________________________ block3_conv3 (Conv2D) (None, 50, 50, 256) 590080 _________________________________________________________________ block3_pool (MaxPooling2D) (None, 25, 25, 256) 0 _________________________________________________________________ block4_conv1 (Conv2D) (None, 25, 25, 512) 1180160 _________________________________________________________________ block4_conv2 (Conv2D) (None, 25, 25, 512) 2359808 _________________________________________________________________ block4_conv3 (Conv2D) (None, 25, 25, 512) 2359808 _________________________________________________________________ block4_pool (MaxPooling2D) (None, 12, 12, 512) 0 _________________________________________________________________ block5_conv1 (Conv2D) (None, 12, 12, 512) 2359808 _________________________________________________________________ block5_conv2 (Conv2D) (None, 12, 12, 512) 2359808 
_________________________________________________________________ block5_conv3 (Conv2D) (None, 12, 12, 512) 2359808 _________________________________________________________________ block5_pool (MaxPooling2D) (None, 6, 6, 512) 0 _________________________________________________________________ global_max_pooling2d (Global (None, 512) 0 _________________________________________________________________ dense_3 (Dense) (None, 512) 262656 _________________________________________________________________ dropout_2 (Dropout) (None, 512) 0 _________________________________________________________________ dense_4 (Dense) (None, 512) 262656 _________________________________________________________________ dropout_3 (Dropout) (None, 512) 0 _________________________________________________________________ dense_5 (Dense) (None, 5) 2565 ================================================================= Total params: 15,242,565 Trainable params: 527,877 Non-trainable params: 14,714,688 _________________________________________________________________
## This pattern also works:
x = model_vgg.output
x = tf.keras.layers.Dense(512)(x)
x = tf.keras.layers.Dropout(0.5)(x)
x = tf.keras.layers.Dense(512)(x)
x = tf.keras.layers.Dropout(0.5)(x)
x = tf.keras.layers.Dense(5)(x)
model = tf.keras.Model(model_vgg.input, x)
See Keras documentation for more.
Let's train!
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
model.fit(
    ds_flowers_trn,
    epochs=20,
    validation_data=ds_flowers_val,
)
Epoch 1/20 21/21 [==============================] - 168s 6s/step - loss: 56.8317 - sparse_categorical_accuracy: 0.4769 - val_loss: 16.5332 - val_sparse_categorical_accuracy: 0.8025 Epoch 2/20 21/21 [==============================] - 20s 932ms/step - loss: 18.6563 - sparse_categorical_accuracy: 0.7666 - val_loss: 16.4549 - val_sparse_categorical_accuracy: 0.7916 Epoch 3/20 21/21 [==============================] - 20s 932ms/step - loss: 18.6675 - sparse_categorical_accuracy: 0.7764 - val_loss: 12.4930 - val_sparse_categorical_accuracy: 0.8297 Epoch 4/20 21/21 [==============================] - 20s 928ms/step - loss: 13.8148 - sparse_categorical_accuracy: 0.7971 - val_loss: 13.8022 - val_sparse_categorical_accuracy: 0.8311 Epoch 5/20 21/21 [==============================] - 20s 936ms/step - loss: 14.7291 - sparse_categorical_accuracy: 0.8192 - val_loss: 14.2620 - val_sparse_categorical_accuracy: 0.8365 Epoch 6/20 21/21 [==============================] - 20s 931ms/step - loss: 10.2870 - sparse_categorical_accuracy: 0.8343 - val_loss: 12.0203 - val_sparse_categorical_accuracy: 0.8460 Epoch 7/20 21/21 [==============================] - 20s 936ms/step - loss: 6.7719 - sparse_categorical_accuracy: 0.8728 - val_loss: 11.9677 - val_sparse_categorical_accuracy: 0.8365 Epoch 8/20 21/21 [==============================] - 20s 929ms/step - loss: 8.3594 - sparse_categorical_accuracy: 0.8512 - val_loss: 11.9430 - val_sparse_categorical_accuracy: 0.8474 Epoch 9/20 21/21 [==============================] - 20s 935ms/step - loss: 8.1771 - sparse_categorical_accuracy: 0.8594 - val_loss: 13.8571 - val_sparse_categorical_accuracy: 0.8447 Epoch 10/20 21/21 [==============================] - 20s 929ms/step - loss: 8.1539 - sparse_categorical_accuracy: 0.8507 - val_loss: 14.9796 - val_sparse_categorical_accuracy: 0.8283 Epoch 11/20 21/21 [==============================] - 20s 934ms/step - loss: 6.9935 - sparse_categorical_accuracy: 0.8831 - val_loss: 14.3296 - 
val_sparse_categorical_accuracy: 0.8338 Epoch 12/20 21/21 [==============================] - 20s 930ms/step - loss: 6.3539 - sparse_categorical_accuracy: 0.8912 - val_loss: 18.9413 - val_sparse_categorical_accuracy: 0.8120 Epoch 13/20 21/21 [==============================] - 20s 934ms/step - loss: 7.4660 - sparse_categorical_accuracy: 0.8772 - val_loss: 12.7482 - val_sparse_categorical_accuracy: 0.8501 Epoch 14/20 21/21 [==============================] - 20s 935ms/step - loss: 5.9989 - sparse_categorical_accuracy: 0.8979 - val_loss: 14.7906 - val_sparse_categorical_accuracy: 0.8283 Epoch 15/20 21/21 [==============================] - 20s 935ms/step - loss: 6.3284 - sparse_categorical_accuracy: 0.8947 - val_loss: 12.0919 - val_sparse_categorical_accuracy: 0.8597 Epoch 16/20 21/21 [==============================] - 20s 934ms/step - loss: 5.1112 - sparse_categorical_accuracy: 0.9164 - val_loss: 12.6545 - val_sparse_categorical_accuracy: 0.8569 Epoch 17/20 21/21 [==============================] - 20s 934ms/step - loss: 4.6218 - sparse_categorical_accuracy: 0.9100 - val_loss: 15.3938 - val_sparse_categorical_accuracy: 0.8270 Epoch 18/20 21/21 [==============================] - 20s 931ms/step - loss: 3.9294 - sparse_categorical_accuracy: 0.9129 - val_loss: 13.7545 - val_sparse_categorical_accuracy: 0.8460 Epoch 19/20 21/21 [==============================] - 20s 934ms/step - loss: 4.8524 - sparse_categorical_accuracy: 0.9127 - val_loss: 13.9475 - val_sparse_categorical_accuracy: 0.8542 Epoch 20/20 21/21 [==============================] - 20s 931ms/step - loss: 4.6092 - sparse_categorical_accuracy: 0.9190 - val_loss: 13.7851 - val_sparse_categorical_accuracy: 0.8351
<tensorflow.python.keras.callbacks.History at 0x7f9b3c586ad0>
85% accuracy after ~20 epochs. Not bad!

What if we want to see which pixels correspond to each class? We want an end-to-end learning system for which the output has the same pixel dimensionality as the input. Why might this be interesting?
We want an architecture that gives a pixel-wise classification.
We need a way to increase the spatial dimensions of our data. What can we do? Upsample and convolve!

aka: transposed convolution, fractionally-strided convolution, "deconvolution"
[image ref: Vincent Dumoulin, Francesco Visin]
As we do this we'll also probably want to decrease the channel dimension of our data, mirroring what we did during convolution.
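The upsample-and-convolve step can be sketched with tools we've already imported: insert zeros between neighbouring input pixels (upsampling by the stride), then run an ordinary convolution. This is an illustrative sketch, not how TF implements transposed convolution internally:

```python
import numpy as np
from scipy.signal import convolve2d

def transposed_conv2d_sketch(x, kernel, stride=2):
    # zero-stuff: insert (stride - 1) zeros between neighbouring input pixels
    h, w = x.shape
    up = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1), dtype=x.dtype)
    up[::stride, ::stride] = x
    # then apply an ordinary convolution to the upsampled grid
    return convolve2d(up, kernel, mode='full')

x = np.array([[1., 2.],
              [3., 4.]])
k = np.ones((3, 3))  # a toy 3x3 kernel
y = transposed_conv2d_sketch(x, k)
```

The output size follows the transposed-convolution formula (n - 1) * stride + kernel_size = (2 - 1) * 2 + 3 = 5, so a 2x2 input becomes a 5x5 output.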
Fully Convolutional Net 2014

contribution:
Shelhamer, E., Long, J., & Darrell, T. (2016) Fully Convolutional Networks for Semantic Segmentation.
U-Net 2015

contribution:
Ronneberger, O., Fischer, P., & Brox, T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation.
More recent contributions overcome the fundamental spatial-semantic tradeoff
DeepLab 2017
| Atrous Convolutions | Atrous Spatial Pyramid Pooling |
|---|---|
| ![]() | ![]() |
contribution
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2017) DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs.
HRNet 2020

contribution:
Wang, J., Sun, K., Cheng, T., et al. (2020) Deep High-Resolution Representation Learning for Visual Recognition.
Want more info, applications? A nice up-to-date blog post
Now let's use transfer learning and transpose convolution to make an FCNN.
We'll need a new dataset, one that has segmentation labels. We also don't want one that's too big, so that it stays manageable for this demo.
Let's use PASCAL VOC (Visual Object Classes): hosted at Oxford, it was used for computer vision challenges from 2005 to 2012.
Warning: untarring within a GCP notebook environment can behave unstably.
### Let's download our data using wget same as we did for Flowers. This is a bigger dataset (~2gb) so be warned!
!wget -c http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar -O {root}/data/voc2012.tar
# in case robots.ox.ac.uk is broken I've mirrored it on GCP:
!wget -c https://storage.googleapis.com/voc-mirror/VOCtrainval_11-May-2012.tar -O {root}/data/voc2012.tar
--2021-06-07 15:14:33-- https://data.deepai.org/PascalVOC2012.zip Resolving data.deepai.org (data.deepai.org)... 138.201.36.183 Connecting to data.deepai.org (data.deepai.org)|138.201.36.183|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 3899239928 (3.6G) [application/x-zip-compressed] Saving to: ‘/home/jupyter/ox-sbs-ml-bd/data/voc2012.zip’ /home/jupyter/ox-sb 100%[===================>] 3.63G 26.7MB/s in 2m 23s 2021-06-07 15:16:56 (26.0 MB/s) - ‘/home/jupyter/ox-sbs-ml-bd/data/voc2012.zip’ saved [3899239928/3899239928]
### Now let's untar our data. (note: specify the directory with -C, and untar options with -x (extract),-v (verbose),-f (pass filename),-z (gzip))
!tar -C {root}/data -xvf {root}/data/voc2012.tar
### Cleanup - remove our original .tar file.
!rm {root}/data/voc2012.tar
Alternative for unstable untar behaviour:
(use this if you are in a Jupyter Lab environment and untar is giving you memory spikes)
### use the google storage api to download the files directly.
from google.cloud import storage
storage_client = storage.Client()
bucket = storage_client.bucket('voc-mirror')
blobs=[b for b in bucket.list_blobs(prefix='VOCdevkit/')]
for ii_b, b in enumerate(blobs):
    if ii_b % 100 == 0:
        print(ii_b, ii_b/len(blobs))
    savepath = os.path.join(root, 'data', b.name)
    if not os.path.exists(os.path.split(savepath)[0]):
        os.makedirs(os.path.split(savepath)[0])
    b.download_to_filename(savepath)
VOC 2012 conveniently includes a list of unique ids with image and segmentation data. Let's read that list and then set up a list of dicts with image and annotation paths. (Why a list? It's a basic Python type and is easy to shuffle.)
# read in unique idxs from 'trainval.txt'. We'll split training and validation later.
with open(os.path.join(root,'data','VOCdevkit','VOC2012','ImageSets','Segmentation','trainval.txt'),'r') as f:
    idxs = [line.strip() for line in f.readlines()] # strip any line breaks, returns, white space
print(idxs[0], idxs[-1])
2007_000032 2011_003271
# make a records list of dicts
records = [
    {
        'image': os.path.join(root,'data','VOCdevkit','VOC2012','JPEGImages',idx+'.jpg'),
        'annotation': os.path.join(root,'data','VOCdevkit','VOC2012','SegmentationClass',idx+'.png'),
    }
    for idx in idxs
]
As usual, we want to inspect our data to make sure we know what it contains, how to open it, develop intuition about it, etc.
# randomly shuffle our records
shuffle(records)
### visualise our data
fig, axs = plt.subplots(4, 6, figsize=(20,12))
axs = axs.flatten()
for ii_r, r in enumerate(records[0:12]):
    img = imread(r['image'])
    ann = imread(r['annotation'])
    axs[2*ii_r].imshow(img)
    axs[2*ii_r+1].imshow(ann)
plt.show()
imread(records[0]['annotation']).shape
(500, 333, 4)
Ah. So all our annotations are 4-channel (RGBA) PNGs! That's annoying!
As before, our generator will need to load our data and preprocess it. In this case, it will also need to load our annotations as images, and then convert them to a mask that we can use as a training target.
### We can conveniently find the colormap that VOC2012 uses to label its objects: https://albumentations.ai/docs/autoalbument/examples/pascal_voc/
VOC_COLORMAP = {
    "background": [0, 0, 0],
    "aeroplane": [128, 0, 0],
    "bicycle": [0, 128, 0],
    "bird": [128, 128, 0],
    "boat": [0, 0, 128],
    "bottle": [128, 0, 128],
    "bus": [0, 128, 128],
    "car": [128, 128, 128],
    "cat": [64, 0, 0],
    "chair": [192, 0, 0],
    "cow": [64, 128, 0],
    "diningtable": [192, 128, 0],
    "dog": [64, 0, 128],
    "horse": [192, 0, 128],
    "motorbike": [64, 128, 128],
    "person": [192, 128, 128],
    "potted plant": [0, 64, 0],
    "sheep": [128, 64, 0],
    "sofa": [0, 192, 0],
    "train": [128, 192, 0],
    "tv/monitor": [0, 64, 128],
}
### let's use our colormap to make a mask-generating function
def get_mask(image):
    # image: a 3D (H x W x RGB) numpy array of the annotation image (alpha already dropped)
    height, width = image.shape[:2]
    segmentation_mask = np.zeros((height, width, len(VOC_COLORMAP.keys())), dtype=np.float32)
    for label_index, (key, rgb_value) in enumerate(VOC_COLORMAP.items()):
        segmentation_mask[:, :, label_index] = np.all(image == rgb_value, axis=-1).astype(float)
    return segmentation_mask
### let's inspect our masks to make sure they're generating properly
for ii in np.random.choice(len(records), 3):
    fig, axs = plt.subplots(1, 3, figsize=(9,3))
    img = imread(records[ii]['image'])
    ann = imread(records[ii]['annotation'])
    mask = get_mask(ann[:,:,0:3]) # need to drop the last channel (the alpha/transparency channel)
    axs[0].imshow(img)
    axs[1].imshow(ann)
    axs[2].imshow(mask.argmax(axis=-1), vmax=21, vmin=0) # use argmax to revert from one-hot to class indices
    print(f'image {ii}:', [{f'class: {jj}', f'label: {list(VOC_COLORMAP.keys())[jj]}'} for jj in np.unique(mask.argmax(axis=-1))])
    plt.show()
image 358: [{'class: 0', 'label: background'}, {'label: sheep', 'class: 17'}]
image 2882: [{'class: 0', 'label: background'}, {'label: car', 'class: 7'}, {'class: 8', 'label: cat'}]
image 2584: [{'class: 0', 'label: background'}, {'class: 8', 'label: cat'}]
def voc2012_generator(records, output_shape=(200,200), mode='random_crop'):
    ### a wrapper for our generator. Takes all our parameters and returns the generator.
    def _generator():
        ### The internal generator must not take any parameters.
        for r in records:
            # io
            x = (imread(r['image'])).astype(np.float32) # <- again, don't normalise.
            ann = imread(r['annotation'])[:,:,0:3] # drop the alpha channel
            y = get_mask(ann) # HWC, float32
            # reduce dimension of array
            if mode=='resize':
                x = resize(x, output_shape)
                y = resize(y, output_shape)
            elif mode=='random_crop':
                crop_width = [(0,0)]*3
                pad_width = [(0,0)]*3
                for ax in [0,1]:
                    if x.shape[ax]>output_shape[ax]:
                        crop_val = np.random.choice(x.shape[ax]-output_shape[ax])
                        crop_width[ax] = (crop_val, x.shape[ax]-output_shape[ax]-crop_val)
                    elif x.shape[ax]<output_shape[ax]:
                        pad_val = np.random.choice(output_shape[ax]-x.shape[ax])
                        pad_width[ax] = (pad_val, output_shape[ax]-x.shape[ax]-pad_val)
                x = crop(x, crop_width)
                x = pad(x, pad_width)
                y = crop(y, crop_width)
                y = pad(y, pad_width)
            yield tf.convert_to_tensor(x), tf.convert_to_tensor(y)
    return _generator
trn_split = 0.7
val_split = 0.9
generator_obj_trn = voc2012_generator(
    records[0:int(trn_split*len(records))],
    output_shape=(224,224),
    mode='resize'
)
generator_obj_val = voc2012_generator(
    records[int(trn_split*len(records)):int(val_split*len(records))],
    output_shape=(224,224),
    mode='resize'
)
generator_obj_test = voc2012_generator(
    records[int(val_split*len(records)):],
    output_shape=(224,224),
    mode='resize'
)
ds_voc_trn = (
    tf.data.Dataset.from_generator(
        generator_obj_trn,
        output_signature=(
            tf.TensorSpec(shape=(224,224,3), dtype=tf.float32),
            tf.TensorSpec(shape=(224,224,21), dtype=tf.float32))) # <- new shape!
    .cache().batch(64).prefetch(tf.data.experimental.AUTOTUNE)
)
ds_voc_val = (
    tf.data.Dataset.from_generator(
        generator_obj_val,
        output_signature=(
            tf.TensorSpec(shape=(224,224,3), dtype=tf.float32),
            tf.TensorSpec(shape=(224,224,21), dtype=tf.float32))) # <- new shape
    .cache().batch(64).prefetch(tf.data.experimental.AUTOTUNE)
)
ds_voc_test = (
    tf.data.Dataset.from_generator(
        generator_obj_test,
        output_signature=(
            tf.TensorSpec(shape=(224,224,3), dtype=tf.float32),
            tf.TensorSpec(shape=(224,224,21), dtype=tf.float32))) # <- new shape
    .cache().batch(64).prefetch(tf.data.experimental.AUTOTUNE)
)
ds_voc_trn = ds_voc_trn.map(vgg16_premapper, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_voc_val = ds_voc_val.map(vgg16_premapper, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_voc_test = ds_voc_test.map(vgg16_premapper, num_parallel_calls=tf.data.experimental.AUTOTUNE)
Let's check the shape of our data to confirm we're getting what we expect.
a,b = next(ds_voc_trn.as_numpy_iterator())
a.shape, a.max(), a.min(), b.shape, b.max(), b.min()
((64, 224, 224, 3), 151.061, -123.68, (64, 224, 224, 21), 1.0, 0.0)
Let's build a fully convolutional neural network for our semantic segmentation problem. We'll want a pretrained encoder and then we'll train a decoder using our new data. We'll add a header to do the final mapping to the output classes.
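As a preview of the decoder half, here is a minimal sketch (my own layer choices, purely illustrative, not the final architecture we'll train): VGG-16 reduces a 224x224 input to a 7x7x512 feature map, so five stride-2 transposed convolutions double the spatial size back up, 7 -> 14 -> 28 -> 56 -> 112 -> 224, while the channel count shrinks, and a final 1x1 convolution maps to the 21 VOC classes.

```python
import tensorflow as tf

# decoder sketch: mirror the encoder's downsampling with transposed convolutions
inputs = tf.keras.Input(shape=(7, 7, 512))  # shape of VGG-16's block5_pool output for 224x224 inputs
x = inputs
for filters in [256, 128, 64, 32, 16]:
    # each transposed convolution doubles the spatial dimensions
    x = tf.keras.layers.Conv2DTranspose(filters, 3, strides=2, padding='same', activation='relu')(x)
outputs = tf.keras.layers.Conv2D(21, 1)(x)  # 1x1 conv header: per-pixel logits over 21 VOC classes
decoder_sketch = tf.keras.Model(inputs, outputs, name='decoder_sketch')
```

The output has the same spatial dimensionality as the original image, with one channel per class: exactly the pixel-wise classification we're after.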
vgg_encoder = tf.keras.applications.VGG16(
include_top=False,
input_shape=(224,224,3),
weights='imagenet',
pooling=None, # <- in this case, we don't want any pooling on our final outputs
)
# as before, let's make our encoder not trainable.
vgg_encoder.trainable = False
vgg_encoder.summary()
Model: "vgg16" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_1 (InputLayer) [(None, 224, 224, 3)] 0 _________________________________________________________________ block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 _________________________________________________________________ block1_conv2 (Conv2D) (None, 224, 224, 64) 36928 _________________________________________________________________ block1_pool (MaxPooling2D) (None, 112, 112, 64) 0 _________________________________________________________________ block2_conv1 (Conv2D) (None, 112, 112, 128) 73856 _________________________________________________________________ block2_conv2 (Conv2D) (None, 112, 112, 128) 147584 _________________________________________________________________ block2_pool (MaxPooling2D) (None, 56, 56, 128) 0 _________________________________________________________________ block3_conv1 (Conv2D) (None, 56, 56, 256) 295168 _________________________________________________________________ block3_conv2 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_conv3 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_pool (MaxPooling2D) (None, 28, 28, 256) 0 _________________________________________________________________ block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160 _________________________________________________________________ block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_pool (MaxPooling2D) (None, 14, 14, 512) 0 _________________________________________________________________ block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808 
_________________________________________________________________ block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_pool (MaxPooling2D) (None, 7, 7, 512) 0 ================================================================= Total params: 14,714,688 Trainable params: 0 Non-trainable params: 14,714,688 _________________________________________________________________
# make a function to build our upblocks flexible to the number of filters
def UpBlock(n_filters, n_blocks):
block_layers = []
for _ in range(n_blocks):
block_layers.append(tf.keras.layers.Activation('relu'))
block_layers.append(tf.keras.layers.Conv2DTranspose(n_filters, 3, padding="same"))
block_layers.append(tf.keras.layers.BatchNormalization())
# and add the upsampling layer
block_layers.append(tf.keras.layers.UpSampling2D(2))
return block_layers
decoder = tf.keras.models.Sequential(
[tf.keras.layers.UpSampling2D(2, input_shape=(7,7,512))]
+ UpBlock(n_filters=256,n_blocks=1)
+ UpBlock(128,1)
+ UpBlock(64,1)
+ UpBlock(32,1),
name='decoder'
)
decoder.summary()
Model: "decoder" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= up_sampling2d (UpSampling2D) (None, 14, 14, 512) 0 _________________________________________________________________ activation (Activation) (None, 14, 14, 512) 0 _________________________________________________________________ conv2d_transpose (Conv2DTran (None, 14, 14, 256) 1179904 _________________________________________________________________ batch_normalization (BatchNo (None, 14, 14, 256) 1024 _________________________________________________________________ up_sampling2d_1 (UpSampling2 (None, 28, 28, 256) 0 _________________________________________________________________ activation_1 (Activation) (None, 28, 28, 256) 0 _________________________________________________________________ conv2d_transpose_1 (Conv2DTr (None, 28, 28, 128) 295040 _________________________________________________________________ batch_normalization_1 (Batch (None, 28, 28, 128) 512 _________________________________________________________________ up_sampling2d_2 (UpSampling2 (None, 56, 56, 128) 0 _________________________________________________________________ activation_2 (Activation) (None, 56, 56, 128) 0 _________________________________________________________________ conv2d_transpose_2 (Conv2DTr (None, 56, 56, 64) 73792 _________________________________________________________________ batch_normalization_2 (Batch (None, 56, 56, 64) 256 _________________________________________________________________ up_sampling2d_3 (UpSampling2 (None, 112, 112, 64) 0 _________________________________________________________________ activation_3 (Activation) (None, 112, 112, 64) 0 _________________________________________________________________ conv2d_transpose_3 (Conv2DTr (None, 112, 112, 32) 18464 _________________________________________________________________ batch_normalization_3 (Batch (None, 112, 112, 
32) 128 _________________________________________________________________ up_sampling2d_4 (UpSampling2 (None, 224, 224, 32) 0 ================================================================= Total params: 1,569,120 Trainable params: 1,568,160 Non-trainable params: 960 _________________________________________________________________
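As a sanity check on those numbers: a 3×3 `Conv2DTranspose` has k·k·in_ch·out_ch kernel weights plus one bias per output channel, and the decoder applies five ×2 upsamplings (the initial `UpSampling2D` plus one per `UpBlock`), bringing the encoder's 7×7 output back to 224×224:

```python
def conv_params(in_ch, out_ch, k=3):
    # kernel weights plus one bias per output channel
    return k * k * in_ch * out_ch + out_ch

assert conv_params(512, 256) == 1_179_904   # conv2d_transpose in the summary above
assert conv_params(256, 128) == 295_040     # conv2d_transpose_1
assert 7 * 2 ** 5 == 224                    # five x2 upsamplings: 7 -> 224
```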
header = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(32, 3, padding="same", input_shape=(224,224,32)),
tf.keras.layers.Conv2D(21, 1, padding="same", activation='softmax')
], name='header')
header.summary()
Model: "header" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= conv2d (Conv2D) (None, 224, 224, 32) 9248 _________________________________________________________________ conv2d_1 (Conv2D) (None, 224, 224, 21) 693 ================================================================= Total params: 9,941 Trainable params: 9,941 Non-trainable params: 0 _________________________________________________________________
model = tf.keras.models.Sequential([
vgg_encoder,
decoder,
header
])
model.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= vgg16 (Functional) (None, 7, 7, 512) 14714688 _________________________________________________________________ decoder (Sequential) (None, 224, 224, 32) 1569120 _________________________________________________________________ header (Sequential) (None, 224, 224, 21) 9941 ================================================================= Total params: 16,293,749 Trainable params: 1,578,101 Non-trainable params: 14,715,648 _________________________________________________________________
model.compile(
optimizer=tf.keras.optimizers.Adam(0.001),
loss=tf.keras.losses.CategoricalCrossentropy(), #<- in this case, classes are one-hot encoded
metrics=[tf.keras.metrics.CategoricalAccuracy()],
)
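`CategoricalCrossentropy` expects one-hot targets, which matches our (224, 224, 21) masks; with integer class maps you would use `SparseCategoricalCrossentropy` instead. Per pixel the loss is $-\sum_c y_c \log \hat{y}_c$. A minimal numpy sketch of the computation:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss -sum_c y_c * log(p_c); y_true is one-hot."""
    y_pred = np.clip(y_pred, eps, 1.0)  # clip for numerical stability
    return -(y_true * np.log(y_pred)).sum(axis=-1)

y_true = np.array([[0., 1., 0.]])      # one-hot: the pixel belongs to class 1
y_pred = np.array([[0.1, 0.8, 0.1]])   # softmax output
loss = categorical_crossentropy(y_true, y_pred)   # equals -log(0.8)
```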
model.fit(
ds_voc_trn,
epochs=5,
validation_data=ds_voc_val,
)
Epoch 1/5
32/32 [==============================] - 861s 26s/step - loss: 1.6911 - categorical_accuracy: 0.6471 - val_loss: 14.7482 - val_categorical_accuracy: 0.1532
Epoch 2/5
32/32 [==============================] - 38s 1s/step - loss: 0.6837 - categorical_accuracy: 0.8272 - val_loss: 3.8629 - val_categorical_accuracy: 0.5857
Epoch 3/5
32/32 [==============================] - 39s 1s/step - loss: 0.5317 - categorical_accuracy: 0.8422 - val_loss: 1.5206 - val_categorical_accuracy: 0.7351
Epoch 4/5
32/32 [==============================] - 39s 1s/step - loss: 0.4423 - categorical_accuracy: 0.8584 - val_loss: 0.7096 - val_categorical_accuracy: 0.8336
Epoch 5/5
32/32 [==============================] - 39s 1s/step - loss: 0.3936 - categorical_accuracy: 0.8706 - val_loss: 0.6458 - val_categorical_accuracy: 0.8335
<tensorflow.python.keras.callbacks.History at 0x7fae7c6fcdd0>
X, Y = next(ds_voc_val.as_numpy_iterator())
Y_hat = model.predict(X)
### due to VGG preprocessing, we need to recover our initial image.
# https://stackoverflow.com/questions/55987302/reversing-the-image-preprocessing-of-vgg-in-keras-to-return-original-image
def deprocess_img(processed_img):
x = processed_img.copy()
# perform the inverse of the preprocessing step
x[:, :, 0] += 103.939
x[:, :, 1] += 116.779
x[:, :, 2] += 123.68
x = x[:, :, ::-1]
x = np.clip(x, 0, 255).astype('uint8')
return x
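To see why this works, here is a numpy sketch of the round trip. The forward preprocessing behaviour is an assumption here (VGG's 'caffe'-style convention: flip RGB to BGR, then subtract the ImageNet per-channel means); we add a `round` before the cast so the integer pixels are recovered exactly:

```python
import numpy as np

# Assumed behaviour of VGG-style preprocessing:
# flip RGB -> BGR, then subtract the ImageNet per-channel means.
VGG_MEANS_BGR = np.array([103.939, 116.779, 123.68])

def vgg_preprocess(img_rgb):
    return img_rgb[..., ::-1].astype('float64') - VGG_MEANS_BGR

def vgg_deprocess(x_bgr):
    x = x_bgr + VGG_MEANS_BGR            # add the means back
    x = x[..., ::-1]                     # BGR -> RGB
    return np.clip(np.round(x), 0, 255).astype('uint8')

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8, 3)).astype('uint8')
assert (vgg_deprocess(vgg_preprocess(img)) == img).all()   # exact round trip
```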
### let's inspect our masks to make sure they're generating properly
N = 5
fig, axs = plt.subplots(3,N,figsize=(N*2,N))
for ii, ii_r in enumerate(np.random.choice(X.shape[0],N)):
axs[0,ii].imshow(deprocess_img(X[ii_r,:,:,:]))
axs[1,ii].imshow(Y[ii_r,...].argmax(axis=-1), vmax=21, vmin=0) # argmax reverts one-hot masks to class indices
axs[2,ii].imshow(Y_hat[ii_r,...].argmax(axis=-1), vmax=21, vmin=0) # argmax reverts one-hot masks to class indices
plt.show()
Okay, so it looks like we're doing reasonably well at class identification, but not at recovering object shape.
Let's try some U-Net-like bridging connections from low-level encoder features.
def make_model():
# don't need to get the last maxpool layer
encoder_output = vgg_encoder.get_layer('block5_conv3').output
# get the featuremaps from the encoder so we can bridge them to the decoder
block1_featuremap = vgg_encoder.get_layer('block1_conv2').output # 224x224
block2_featuremap = vgg_encoder.get_layer('block2_conv2').output # 112x112
# VGG doesn't have batch norm so let's do it ourselves:
block1_featuremap = tf.keras.layers.Activation('relu')(block1_featuremap)
block1_featuremap = tf.keras.layers.BatchNormalization()(block1_featuremap)
block2_featuremap = tf.keras.layers.Activation('relu')(block2_featuremap)
block2_featuremap = tf.keras.layers.BatchNormalization()(block2_featuremap)
# the decoder with bridging:
x = tf.keras.models.Sequential(UpBlock(128,2))(encoder_output) # 28x28
x = tf.keras.models.Sequential(UpBlock(64,2))(x) # 56x56
x = tf.keras.models.Sequential(UpBlock(32,2))(x) # 112x112
x = tf.keras.layers.Concatenate()([block2_featuremap,x]) # 112x112
x = tf.keras.models.Sequential(UpBlock(32,2))(x) # 224x224
x = tf.keras.layers.Concatenate()([block1_featuremap,x]) # 224x224
# add some header layers:
x = tf.keras.layers.Activation('relu')(x)
x = tf.keras.layers.Conv2D(32, 3, padding="same")(x)
output = tf.keras.layers.Conv2D(21, 1, padding="same", activation='softmax')(x)
return tf.keras.models.Model(vgg_encoder.input, output)
model = make_model()
model.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 224, 224, 3) 0
__________________________________________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 input_1[0][0]
__________________________________________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928 block1_conv1[0][0]
__________________________________________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0 block1_conv2[0][0]
__________________________________________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128 73856 block1_pool[0][0]
__________________________________________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128 147584 block2_conv1[0][0]
__________________________________________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0 block2_conv2[0][0]
__________________________________________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168 block2_pool[0][0]
__________________________________________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080 block3_conv1[0][0]
__________________________________________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080 block3_conv2[0][0]
__________________________________________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0 block3_conv3[0][0]
__________________________________________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160 block3_pool[0][0]
__________________________________________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808 block4_conv1[0][0]
__________________________________________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808 block4_conv2[0][0]
__________________________________________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0 block4_conv3[0][0]
__________________________________________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808 block4_pool[0][0]
__________________________________________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808 block5_conv1[0][0]
__________________________________________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808 block5_conv2[0][0]
__________________________________________________________________________________________________
sequential_1 (Sequential) (None, 28, 28, 128) 738560 block5_conv3[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 112, 112, 128 0 block2_conv2[0][0]
__________________________________________________________________________________________________
sequential_2 (Sequential) (None, 56, 56, 64) 111232 sequential_1[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 112, 112, 128 512 activation_5[0][0]
__________________________________________________________________________________________________
sequential_3 (Sequential) (None, 112, 112, 32) 27968 sequential_2[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 224, 224, 64) 0 block1_conv2[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 112, 112, 160 0 batch_normalization_5[0][0]
sequential_3[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 224, 224, 64) 256 activation_4[0][0]
__________________________________________________________________________________________________
sequential_4 (Sequential) (None, 224, 224, 32) 55616 concatenate[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 224, 224, 96) 0 batch_normalization_4[0][0]
sequential_4[0][0]
__________________________________________________________________________________________________
activation_14 (Activation) (None, 224, 224, 96) 0 concatenate_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 224, 224, 32) 27680 activation_14[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 224, 224, 21) 693 conv2d_2[0][0]
==================================================================================================
Total params: 15,677,205
Trainable params: 961,109
Non-trainable params: 14,716,096
__________________________________________________________________________________________________
model.compile(
optimizer=tf.keras.optimizers.Adam(0.001),
loss=tf.keras.losses.CategoricalCrossentropy(), #<- in this case, classes are one-hot encoded
metrics=[tf.keras.metrics.CategoricalAccuracy()],
)
model.fit(
ds_voc_trn,
epochs=5,
validation_data=ds_voc_val,
)
Epoch 1/5
32/32 [==============================] - 63s 2s/step - loss: 1.6185 - categorical_accuracy: 0.6113 - val_loss: 3.9719 - val_categorical_accuracy: 0.4744
Epoch 2/5
32/32 [==============================] - 54s 2s/step - loss: 0.7886 - categorical_accuracy: 0.7705 - val_loss: 1.6923 - val_categorical_accuracy: 0.6200
Epoch 3/5
32/32 [==============================] - 54s 2s/step - loss: 0.6383 - categorical_accuracy: 0.7999 - val_loss: 1.1082 - val_categorical_accuracy: 0.7022
Epoch 4/5
32/32 [==============================] - 54s 2s/step - loss: 0.5530 - categorical_accuracy: 0.8230 - val_loss: 0.9435 - val_categorical_accuracy: 0.7439
Epoch 5/5
32/32 [==============================] - 54s 2s/step - loss: 0.5729 - categorical_accuracy: 0.8243 - val_loss: 0.8681 - val_categorical_accuracy: 0.7499
<tensorflow.python.keras.callbacks.History at 0x7fae7fc3a550>
X, Y = next(ds_voc_val.as_numpy_iterator())
Y_hat = model.predict(X)
### let's inspect our masks to see how they're doing now
N = 5
fig, axs = plt.subplots(3,N,figsize=(N*2,N))
for ii, ii_r in enumerate(np.random.choice(X.shape[0],N, replace=False)):
axs[0,ii].imshow(deprocess_img(X[ii_r,:,:,:]))
axs[1,ii].imshow(Y[ii_r,...].argmax(axis=-1), vmax=21, vmin=0) # argmax reverts one-hot masks to class indices
axs[2,ii].imshow(Y_hat[ii_r,...].argmax(axis=-1), vmax=21, vmin=0) # argmax reverts one-hot masks to class indices
for jj in range(3):
axs[jj,ii].axis('off')
plt.show()
### Let's get the confusion matrix for the test set
cs = []
for X, Y in ds_voc_test:
Y_hat = model.predict(X)
C = confusion_matrix(Y.numpy().argmax(axis=-1).flatten(), Y_hat.argmax(axis=-1).flatten(), labels=range(21)) # pass labels so every batch yields a 21x21 matrix, even if some classes are absent
cs.append(C)
confusion_arr = np.array(cs).sum(axis=0)
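Pixel accuracy is dominated by the background class, so mean intersection-over-union (IoU) is the usual segmentation metric. It can be read straight off the accumulated confusion matrix; a sketch (assuming, as with sklearn's `confusion_matrix`, that rows are ground truth and columns are predictions):

```python
import numpy as np

def per_class_iou(conf):
    """IoU per class from a confusion matrix: TP / (TP + FP + FN)."""
    tp = np.diag(conf).astype('float64')
    fp = conf.sum(axis=0) - tp   # predicted as this class, but actually another
    fn = conf.sum(axis=1) - tp   # actually this class, but predicted as another
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)

# mean IoU over the 21 VOC classes, ignoring classes absent from the test set:
# miou = np.nanmean(per_class_iou(confusion_arr))
```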
fig, ax = plt.subplots(1,1, figsize=(12,12))
vmax = confusion_arr[1:,1:].max()
ax.imshow(confusion_arr, vmax=vmax)
ax.set_yticks(range(len(VOC_COLORMAP.keys())))
ax.set_yticklabels(list(VOC_COLORMAP.keys()))
ax.set_xticks(range(len(VOC_COLORMAP.keys())))
ax.set_xticklabels(list(VOC_COLORMAP.keys()), rotation=90)
plt.show()
Open a Google Colab notebook like this one: https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/quickstart/beginner.ipynb
From Runtime choose Change runtime type and under Hardware Accelerator select GPU. Congrats! You are now ready for GPU-enhanced deep learning!
Test your GPU setup with TensorFlow:
import tensorflow as tf
tf.config.list_physical_devices()
!nvidia-smi
Tools -> Settings -> Miscellaneous
We've seen that designing a powerful machine learning system requires a combination of intuition, experimentation, and computational resources. How can we systematically track the ML experiments we run? How can we keep an eye on them as they progress?
Let's introduce two new libraries: Sacred and TensorBoard.
Sacred is an open-source machine learning experimentation framework that allows users to track their experiments and retain their configurations and results.
TensorBoard is a visualisation toolkit that allows users to watch their models training in real-time, and log and visualise training progress.
We may also want to write our own custom components, for example:
Let's see how we can write a custom ML experiment using Sacred and TensorBoard that does all these things.
NB: this tutorial should be run in a local or cloud Jupyter lab environment.
Setting up our directories for successfully tracking experiments:
root                        // repository root for version control
│   README.md               // repo README with usage instructions
│   cli.py                  // good practice - you want an entrypoint for your project
│   runner.py               // (optional) abstract your experiment entrypoint away from your CLI
│   conf.yml                // you can use yaml for human-readable config files
│
└───myproject               // where the actual code is kept
│   │   __init__.py         // initialise your experiment here
│   │   train.py            // code containing your training loop
│   │   ... eval.py, loss.py, etc. // you may want to break out other modules for tidiness
│   │   main.py             // code for running your ML curriculum
│   └───models              // you may want to break out your model-generating scripts
│   │       mymodel.py      // keeps your ML models tidy
│
└───experiments             // a directory for all your experiment data, gitignored
│   └───sacred              // experiments subdirectory for Sacred experiment files
│   └───tensorboard         // experiments subdirectory for TensorBoard experiment files
│
└───data                    // a directory for all your data, gitignored
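As an illustration, `conf.yml` above might hold the experiment's hyperparameters in flat, human-readable form (all names and values here are hypothetical):

```yaml
# conf.yml -- hypothetical example configuration
experiment_name: vgg16_fcn_voc2012
data_dir: data/voc2012
batch_size: 64
epochs: 5
learning_rate: 0.001
```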
cli.py -> command line interface, tells users how to use your project and allows cli execution, e.g. python cli.py mycommand --option=value
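A minimal `cli.py` along these lines could be built on the standard library's argparse; this is just a sketch, and the command and option names are hypothetical:

```python
import argparse

def build_parser():
    """CLI entrypoint parser, e.g.: python cli.py train --epochs=5"""
    parser = argparse.ArgumentParser(description='myproject command line interface')
    sub = parser.add_subparsers(dest='command', required=True)
    train = sub.add_parser('train', help='run the training loop')
    train.add_argument('--config', default='conf.yml', help='path to the YAML config file')
    train.add_argument('--epochs', type=int, default=5, help='number of training epochs')
    return parser

# in cli.py you would call parse_args() on sys.argv and dispatch on args.command:
args = build_parser().parse_args(['train', '--epochs', '3'])
assert args.command == 'train' and args.epochs == 3
```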
data/ and experiments/ can also be paths to mounted disks, usually at, e.g., /mnt/data/. This layout is suitable for projects up to a fairly large scale (~5 TB).
Sacred uses Experiments and Observers to set up and track ML experiments. Sacred can capture console output and save files using artifacts.
Observers allow you to mirror the same experiment metrics and results to multiple locations: file paths, Mongo DBs, AWS, GCP, and Azure cloud storage.
Explore this repo to see how to use these tools.